AITopics | error rate control

AutoMS: Automatic Model Selection for Novelty Detection with Error Rate Control

Neural Information Processing SystemsDec-24-2025, 14:12:05 GMT

Given an unsupervised novelty detection task on a new dataset, how can we automatically select a ''best'' detection model while simultaneously controlling the error rate of the best model? For novelty detection analysis, numerous detectors have been proposed to detect outliers on a new unseen dataset based on a score function trained on available clean data. However, due to the absence of labeled data for model evaluation and comparison, there is a lack of systematic approaches that are able to select a ''best'' model/detector (i.e., the algorithm as well as its hyperparameters) and achieve certain error rate control simultaneously. In this paper, we introduce a unified data-driven procedure to address this issue. The key idea is to maximize the number of detected outliers while controlling the false discovery rate (FDR) with the help of Jackknife prediction. We establish non-asymptotic bounds for the false discovery proportions and show that the proposed procedure yields valid FDR control under some mild conditions. Numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the effectiveness of our proposed AutoMS method.

automatic model selection, name change, novelty detection, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

7dced224614f3c50b1626473f48312bf-Paper-Conference.pdf

Neural Information Processing SystemsNov-15-2025, 06:21:36 GMT

automs, detector, fdr control, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Oregon (0.04)
(2 more...)

Genre: Research Report (0.95)

Industry:

Law Enforcement & Public Safety (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.56)

Add feedback

Statistical Inference Leveraging Synthetic Data with Distribution-Free Guarantees

Bashari, Meshi, Lee, Yonghoon, Lotan, Roy Maor, Dobriban, Edgar, Romano, Yaniv

arXiv.org Machine LearningSep-25-2025

The rapid proliferation of high-quality synthetic data -- generated by advanced AI models or collected as auxiliary data from related tasks -- presents both opportunities and challenges for statistical inference. This paper introduces a GEneral Synthetic-Powered Inference (GESPI) framework that wraps around any statistical inference procedure to safely enhance sample efficiency by combining synthetic and real data. Our framework leverages high-quality synthetic data to boost statistical power, yet adaptively defaults to the standard inference method using only real data when synthetic data is of low quality. The error of our method remains below a user-specified bound without any distributional assumptions on the synthetic data, and decreases as the quality of the synthetic data improves. This flexibility enables seamless integration with conformal prediction, risk control, hypothesis testing, and multiple testing procedures, all without modifying the base inference method. We demonstrate the benefits of our method on challenging tasks with limited labeled data, including AlphaFold protein structure prediction, and comparing large reasoning models on complex math problems.

alg, prediction, synthetic data, (16 more...)

arXiv.org Machine Learning

2509.20345

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

AutoMS: Automatic Model Selection for Novelty Detection with Error Rate Control

Neural Information Processing SystemsAug-16-2025, 09:09:06 GMT

Given an unsupervised novelty detection task on a new dataset, how can we automatically select a "best" detection model while simultaneously controlling the

automs, detector, fdr control, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Oregon (0.04)
(2 more...)

Genre: Research Report (0.95)

Industry:

Law Enforcement & Public Safety (0.46)
Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.52)

Add feedback

AutoMS: Automatic Model Selection for Novelty Detection with Error Rate Control

Neural Information Processing SystemsJan-15-2025, 05:30:49 GMT

Given an unsupervised novelty detection task on a new dataset, how can we automatically select a ''best'' detection model while simultaneously controlling the error rate of the best model? For novelty detection analysis, numerous detectors have been proposed to detect outliers on a new unseen dataset based on a score function trained on available clean data. However, due to the absence of labeled data for model evaluation and comparison, there is a lack of systematic approaches that are able to select a ''best'' model/detector (i.e., the algorithm as well as its hyperparameters) and achieve certain error rate control simultaneously. In this paper, we introduce a unified data-driven procedure to address this issue. The key idea is to maximize the number of detected outliers while controlling the false discovery rate (FDR) with the help of Jackknife prediction. We establish non-asymptotic bounds for the false discovery proportions and show that the proposed procedure yields valid FDR control under some mild conditions.

automatic model selection, error rate control, novelty detection, (4 more...)

Neural Information Processing Systems

Technology: